Regulating Orthography-Phonology Relationship for English to Thai Transliteration

Authors

  • Binh Minh Nguyen
  • Hoang Gia Ngo
  • Nancy F. Chen
Abstract

In this paper, we describe our submission to the Named Entities Workshop (NEWS) 2016 transliteration shared task, focusing on English-to-Thai transliteration. The alignment between Thai orthography and phonology is not always monotonic, but few transliteration systems take this into account. In our proposed system, we exploit phonological knowledge to resolve problematic instances where the monotonic alignment assumption breaks down. We achieve a 29% relative improvement over the baseline system for the NEWS 2016 transliteration shared task.
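To make the non-monotonic alignment concrete, here is a minimal Python sketch. It is not the authors' system. It only illustrates one well-known source of the problem: the five Thai vowels that are written before a consonant but pronounced after it. The function name `to_phonological_order` and the single-consonant assumption (no handling of consonant clusters) are simplifications introduced here for illustration.

```python
# Thai vowel signs written before the consonant they phonologically follow
# (Unicode U+0E40 through U+0E44).
PREPOSED_VOWELS = {chr(code) for code in range(0x0E40, 0x0E45)}


def to_phonological_order(word: str) -> str:
    """Move each preposed vowel to just after the character that follows it.

    Hypothetical helper: assumes the vowel attaches to exactly one
    following consonant, which ignores consonant clusters.
    """
    out = []
    i = 0
    while i < len(word):
        if word[i] in PREPOSED_VOWELS and i + 1 < len(word):
            # Swap: emit the consonant first, then the vowel.
            out.append(word[i + 1])
            out.append(word[i])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Written order: vowel, consonant, consonant.
    # Reordered: consonant, vowel, consonant.
    print(to_phonological_order("\u0e41\u0e1f\u0e19"))
```

After reordering, a left-to-right alignment between the written characters and the spoken sounds becomes possible for this simple case. A real system would need more phonological knowledge than this swap.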


Similar resources

A learning method for Thai phonetization of English words

This article tackles the problem of transcribing English words using the Thai phonological system. The problem arises in Thai, where modern writing often includes English orthography, and transcribing with English phonology produces unnatural results. The proposed model is entirely data-driven, starting with automatic grapheme-phoneme alignment, modeling transduction rules and predicting Thai syllabic tones...


Can the first letter advantage be shaped by script-specific characteristics?

We examined whether the first letter advantage that has been reported in the Roman script disappears, or even reverses, depending on the characteristics of the orthography. We chose Thai because it has several "nonaligned" vowels that are written prior to the consonant but phonologically follow it in speech (e.g., แฟน <ε:fn> is spoken as /fɛ:n/) whereas other "aligned" vowels are written and sp...


Syllable-Based Thai-English Machine Transliteration

This article describes the first trial of bidirectional Thai-English machine transliteration applied to the NEWS 2010 transliteration corpus. The system relies on segmenting source-language words into syllable-like units, finding the units' pronunciations, consulting a syllable transliteration table to form target-language word hypotheses, and ranking the hypotheses using a syllable n-gram. The app...


A Chunk-based n-gram English to Thai Transliteration

In this study, a chunk-based n-gram model is proposed for English to Thai transliteration. The model is compared with three other models: a table lookup model, a decision tree model, and a statistical model. The chunk-based n-gram model achieves 67% word accuracy, which is higher than the accuracy of the other models. The performance of all models increases slightly when an English grapheme to phoneme is...


Hindi and Marathi to English NE Transliteration Tool using Phonology and Stress Analysis

During the last two decades, most named entity (NE) machine transliteration work in India has used English as the source language and Indian languages as the target languages, relying on grapheme models with statistical probability approaches and classification tools. Much less work has been carried out on machine transliteration from Indian languages to English.




Publication date: 2016